gh-133968: Use private unicode writer for json #133832
Conversation
What's the point? This just adds more maintenance if we make changes to how PyUnicodeWriter works.

They might have caused a performance regression compared to 3.13: faster-cpython/ideas#726
ZeroIntensity left a comment:
I don't think this is a good idea.
- json optimizations have been rejected in the past -- use ujson or something like that if performance is critical.
- If we change how PyUnicodeWriter works, it adds more maintenance, especially if we use this as precedent for doing this elsewhere.
- We should aim for optimizing PyUnicodeWriter as a broader change, not speed up each individual case by using the private API.
vstinner left a comment:
Would it be possible to only replace PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString()? Do you get similar performance in this case?
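As a rough illustration of the substitution being asked about (the writer variables and error label are placeholders, not the actual Modules/_json.c code):

```c
/* Illustrative sketch only -- not the actual Modules/_json.c code.
 * `pub_writer` is a PyUnicodeWriter* (public API) and `priv_writer` is a
 * _PyUnicodeWriter (private struct); both calls append a known-ASCII string. */

/* Public API: the input is treated as UTF-8 and decoded while writing. */
if (PyUnicodeWriter_WriteUTF8(pub_writer, "null", 4) < 0) {
    goto error;
}

/* Private API: the input is assumed to be pure ASCII and copied directly,
 * skipping UTF-8 decoding (only checked by an assertion in debug builds). */
if (_PyUnicodeWriter_WriteASCIIString(&priv_writer, "null", 4) < 0) {
    goto error;
}
```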
See also #133186.
I chose not to expose it since it generates an invalid string if the input string contains non-ASCII characters. But yeah, maybe we should expose it. The function only validates the input string in debug mode, for best performance.
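The kind of debug-only validation described above might look roughly like this (an assumed pattern for illustration, not the function's actual source):

```c
#include <assert.h>  /* assert() is compiled out when NDEBUG is defined (release builds) */
#include <stddef.h>

/* Hypothetical sketch of a debug-only ASCII check, not the real function body. */
static void
check_ascii(const char *str, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        assert((unsigned char)str[i] < 128);  /* input must be pure ASCII */
    }
}
```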
Maybe, but my current benchmark has too much overhead to measure this accurately. I'll have to rewrite it first. I hope we can figure out how to get the performance of the public API very close to the private one, such that everyone feels comfortable using it.
I updated the benchmark, but I don't understand why:

Does this have something to do with the overallocation?

The private API doesn't enable overallocation by default.
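For illustration, a generic sketch of how a caller opts into overallocation with the private writer (an assumed usage pattern, not code from this PR):

```c
/* Generic sketch (not code from this PR): the private writer exposes an
 * `overallocate` flag that callers set themselves. */
_PyUnicodeWriter writer;
_PyUnicodeWriter_Init(&writer);
writer.overallocate = 1;   /* grow the buffer ahead of need to amortize resizes */

if (_PyUnicodeWriter_WriteASCIIString(&writer, "example", 7) < 0) {
    _PyUnicodeWriter_Dealloc(&writer);
    return NULL;
}
return _PyUnicodeWriter_Finish(&writer);
```

Without overallocation, each write can resize the buffer to exactly the new size; with it, the buffer grows ahead of demand, so a long series of small writes is amortized.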
I replaced PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString() in Modules/_json.c and ran a benchmark:
Benchmark hidden because not significant (15): encode 100 integers, encode 100 floats, encode 100 "ascii" strings, encode ascii string len=100, encode escaped string len=128, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode Unicode string len=1000, encode 10000 "ascii" strings, encode ascii string len=10000, encode escaped string len=9984, encode Unicode string len=10000

I built Python with --enable-optimizations and --with-lto.

Benchmark code:

```python
import json
import pyperf

runner = pyperf.Runner()
for count in (100, 1_000, 10_000):
    runner.bench_func(f'encode {count} booleans', json.dumps, [True, False] * (count // 2))
    runner.bench_func(f'encode {count} integers', json.dumps, list(range(count)))
    runner.bench_func(f'encode {count} floats', json.dumps, [1.0] * count)
    runner.bench_func(f'encode {count} "ascii" strings', json.dumps, ['ascii'] * count)

    text = 'ascii'
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode ascii string len={len(text)}', json.dumps, text)

    text = ''.join(chr(ch) for ch in range(128))
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode escaped string len={len(text)}', json.dumps, text)

    text = 'abcd€'
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode Unicode string len={len(text)}', json.dumps, text)
```
I also ran my benchmark on this PR:
Benchmark hidden because not significant (1): encode Unicode string len=1000
Yeah, but both the old code and the public API enable overallocation, so it's not that. (And you really don't want to turn it off.)
This line is inefficient for exact string instances (Objects/unicodeobject.c, line 13936 at commit 86c1d43).
Here's the comparison with a minimal PR:
Merged. I confirmed with two benchmarks that this small optimization makes a big difference on some use cases such as encoding short strings in JSON.
Ok, I created #133973 to add PyUnicodeWriter_WriteASCII().
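A minimal usage sketch of the proposed function follows; the signature is assumed to mirror PyUnicodeWriter_WriteUTF8(), so treat it as illustrative rather than authoritative:

```c
/* Sketch assuming the proposed API mirrors PyUnicodeWriter_WriteUTF8():
 * int PyUnicodeWriter_WriteASCII(PyUnicodeWriter *writer,
 *                                const char *str, Py_ssize_t size);
 * The caller guarantees that `str` is pure ASCII. */
PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
if (writer == NULL) {
    return NULL;
}
if (PyUnicodeWriter_WriteASCII(writer, "\"key\": ", 7) < 0) {
    PyUnicodeWriter_Discard(writer);
    return NULL;
}
return PyUnicodeWriter_Finish(writer);
```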
If that's merged, we would use this approach in 3.14, right?
@vstinner, it looks like the regression in json.loads comes from the heap allocation of the writer.
Are you sure about that? The implementation uses a freelist which avoids the heap allocation in most cases. |
Would you mind creating a separate PR just for that?
Are the benchmarks creating an unrealistic number of concurrent writers? That would starve the freelist and create some allocation overhead, but only in the benchmarks.
You're right, it does seem to be using the only entry of the freelist. (I disabled the malloc to check.)
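To make the starvation scenario concrete, here is a hypothetical one-slot freelist (illustration only, not CPython's implementation): once two writers are alive at the same time, the second one misses the cached slot and falls back to the allocator.

```c
#include <stdlib.h>

/* Hypothetical one-slot freelist, for illustration only. */
typedef struct writer { char buf[256]; } writer_t;

static writer_t *freelist_slot = NULL;

static writer_t *writer_alloc(void)
{
    if (freelist_slot != NULL) {
        writer_t *w = freelist_slot;  /* fast path: reuse the cached writer */
        freelist_slot = NULL;
        return w;
    }
    return malloc(sizeof(writer_t));  /* slow path: heap allocation */
}

static void writer_free(writer_t *w)
{
    if (freelist_slot == NULL) {
        freelist_slot = w;            /* cache one writer for the next caller */
    }
    else {
        free(w);
    }
}
```

In a benchmark that keeps several writers alive concurrently, every allocation after the first takes the malloc path, which is the overhead being discussed here.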
I suggest closing this PR. It's not worth it anymore (according to the benchmark below) and I prefer to stick to the public C API. I made two small optimizations in the public PyUnicodeWriter API.
With these optimizations, it seems like this PR is less appealing. I ran a benchmark to compare this PR to the current main branch:
The best speedup is 1.07x faster for "encode 10000 integers". The worst slowdown is 1.10x slower for "encode Unicode string len=10000". Overall, the impact is "1.00x faster", which is not impressive.
Hum. I would be interested in a change which would just remove _PyUnicodeWriter_IsEmpty(), without touching the WriteUTF8/WriteASCII calls.
According to the pyperformance benchmark, json_loads is still 10% slower because of the freelist. And after I've updated the PR, it will only be using the public API.
Done. Decoding empty strings is now 17% faster. Annoyingly, decoding strings with escapes is 7% slower.
I merged your change, thanks.
This still needs to be backported.
Hm, do we backport performance improvements?
We backported the other fixes for the performance regression. See the issue.
I thought the introduction of ...
I would prefer not to backport this change.
- pyperformance (with --enable-optimizations and --with-lto)
- jsonyx-performance-tests (with --enable-optimizations and --with-lto)